{
 "cells": [
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "4baa8c35-0651-40c0-b765-729b02bc55be",
   "metadata": {},
   "outputs": [],
   "source": [
    "from langchain_community.document_loaders import PyPDFLoader"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "3c492a51-b4fb-46d9-a573-86a4242b0f86",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "Document(page_content=\"By using digital twin t echnology , manufactur ers can monit or, simulat e, and optimize the\\nperformance o f their assets in r eal-time, as w ell as t est new scenar ios and solutions befor e\\nimplementing them in the r eal w orld - By K.A. Gerar dino.\\nImagine a fact ory wher e you can monit or and contr ol ev ery aspect o f the pr oduction\\nprocess, fr om the machines' design and per formance t o the pr oduct' s quality and\\nefficiency . How about a fact ory wher e you can t est and optimize differ ent scenarios,\\nsuch as changing the lay out, adding new equipment, or adjusting the p aramet ers,\\nwithout affecting the actual operations? Now , try and visualize a fact ory wher e you\\ncan det ect and pr event pr oblems befor e they cause downtime, wast e, or defects. This\\nis not a fantasy , but a r eality made possible by digital twin\\nDigital twin in a fact ory is a concept that inv olves creating a vir tual r epresentation o f a\\nphysical asset, such as a machine, a pr ocess, or a building, using data fr om sensor s or\\nother sour ces. By using digital twin t echnology , manufactur ers can monit or, simulat e,\\nand optimize the per formance o f their assets in r eal-time, as w ell as t est new\\nscenarios and solutions befor e implementing them in the r eal w orld. Digital twins in\\nfactories can help impr ove efficiency , quality , safety , and innov ation in manufacturing\\nDIGIT AL T WIN IN THE F ACTORY: A\\nREALIT Y MADE POSSIBLE\\nIndustr y USA\\nStay up-t o-dat e with the lat est in engineering. This p age is\\na one-st op sour ce for all t echnical insights.\\n发布日期: 2023 年 10 月 11 日+ 关注\", metadata={'source': 'data/DIGITAL-TWIN.pdf', 'page': 0})"
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
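   ],
   "source": [
    "# load_and_split() reads the PDF and splits it with the default\n",
    "# RecursiveCharacterTextSplitter, returning a list of Document objects.\n",
    "loader = PyPDFLoader(\"data/DIGITAL-TWIN.pdf\")\n",
    "pages = loader.load_and_split()\n",
    "pages[0]"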
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "4d720a4f-0eec-443f-9a93-57d189707190",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "By using digital twin t echnology , manufactur ers can monit or, simulat e, and optimize the performance o f their assets in r eal-time, as w ell as t est new scenar ios and solutions befor e implementing them in the r eal w orld - By K.A. Gerar dino. Imagine a fact ory wher e you can monit or and contr ol ev ery aspect o f the pr oduction process, fr om the machines' design and per formance t o the pr oduct' s quality and efficiency . How about a fact ory wher e you can t est and optimize differ ent scenarios, such as changing the lay out, adding new equipment, or adjusting the p aramet ers, without affecting the actual operations? Now , try and visualize a fact ory wher e you can det ect and pr event pr oblems befor e they cause downtime, wast e, or defects. This is not a fantasy , but a r eality made possible by digital twin Digital twin in a fact ory is a concept that inv olves creating a vir tual r epresentation o f a physical asset, such as a machine, a pr ocess, or a building, using data fr om sensor s or other sour ces. By using digital twin t echnology , manufactur ers can monit or, simulat e, and optimize the per formance o f their assets in r eal-time, as w ell as t est new scenarios and solutions befor e implementing them in the r eal w orld. Digital twins in factories can help impr ove efficiency , quality , safety , and innov ation in manufacturing DIGIT AL T WIN IN THE F ACTORY: A REALIT Y MADE POSSIBLE Industr y USA Stay up-t o-dat e with the lat est in engineering. This p age is a one-st op sour ce for all t echnical insights. 发布日期: 2023 年 10 月 11 日+ 关注\n"
     ]
    }
   ],
   "source": [
    "# Replace the hard line breaks left by PDF extraction with spaces.\n",
    "fixed_text = pages[0].page_content.replace(\"\\n\", \" \")\n",
    "print(fixed_text)\n",
    "pages[0].page_content = fixed_text"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "id": "002e794e-1603-42fb-aea0-40f494e0f601",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "1\n",
      "By using digital twin t echnology , manufactur ers can monit or, simulat e, and optimize the performance o f their assets in r eal-time, as w ell as t est new scenar ios and solutions befor e implementing them in the r eal w orld - By K.A. Gerar dino. Imagine a fact ory wher e you can monit or and contr ol ev ery aspect o f the pr oduction process, fr om the machines' design and per formance t o the pr oduct' s quality and efficiency . How about a fact ory wher e you can t est and optimize differ ent scenarios, such as changing the lay out, adding new equipment, or adjusting the p aramet ers, without affecting the actual operations? Now , try and visualize a fact ory wher e you can det ect and pr event pr oblems befor e they cause downtime, wast e, or defects. This is not a fantasy , but a r eality made possible by digital twin Digital twin in a fact ory is a concept that inv olves creating a vir tual r epresentation o f a physical asset, such as a machine, a pr ocess, or a building, using data fr om sensor s or other sour ces. By using digital twin t echnology , manufactur ers can monit or, simulat e, and optimize the per formance o f their assets in r eal-time, as w ell as t est new scenarios and solutions befor e implementing them in the r eal w orld. Digital twins in factories can help impr ove efficiency , quality , safety , and innov ation in manufacturing DIGIT AL T WIN IN THE F ACTORY: A REALIT Y MADE POSSIBLE Industr y USA Stay up-t o-dat e with the lat est in engineering. This p age is a one-st op sour ce for all t echnical insights. 发布日期: 2023 年 10 月 11 日+ 关注\n"
     ]
    }
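   ],
   "source": [
    "from langchain_text_splitters import CharacterTextSplitter, RecursiveCharacterTextSplitter\n",
    "\n",
    "# CharacterTextSplitter splits only on its separator (default \"\\n\\n\").\n",
    "# fixed_text has no newlines left, so split_text returns a single chunk\n",
    "# longer than chunk_size; pass separator=\" \" to split on spaces instead.\n",
    "c_splitter = CharacterTextSplitter(\n",
    "    chunk_size=200,\n",
    "    chunk_overlap=20,\n",
    ")\n",
    "texts = c_splitter.split_text(fixed_text)\n",
    "print(len(texts))\n",
    "for t in texts:\n",
    "    print(t)\n",
    "\n",
    "# Alternative: split the Document objects directly.\n",
    "# docs = c_splitter.split_documents(pages)\n",
    "# print(type(docs[0]))\n",
    "# print(len(docs))\n",
    "# print(docs[:5])\n",
    "\n",
    "# Alternative: RecursiveCharacterTextSplitter tries \"\\n\\n\", \"\\n\", \" \",\n",
    "# and \"\" in turn until the chunks fit chunk_size.\n",
    "# r_splitter = RecursiveCharacterTextSplitter(\n",
    "#     chunk_size=200,\n",
    "#     chunk_overlap=20\n",
    "# )\n",
    "# rdocs = r_splitter.split_documents(pages)\n",
    "# print(rdocs[0])"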
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "40f9ed66-ed6c-4fac-97f9-0073558015b3",
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.10.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
